Performance Engineering for a Medical Imaging Application on the Intel Xeon Phi Accelerator
Authors
Abstract
We examine the Xeon Phi, which is based on Intel’s Many Integrated Core (MIC) architecture, for its suitability to run the FDK algorithm, the most commonly used algorithm for 3D image reconstruction in cone-beam computed tomography. We study the challenges of efficiently parallelizing the application and ways to enable sensible data sharing between threads despite the lack of a shared last-level cache. Apart from parallelization, SIMD vectorization is critical for good performance on the Xeon Phi; we run various micro-benchmarks to investigate the platform’s new set of vector instructions, with special emphasis on the newly introduced vector gather capability. We refine a previous performance model for the application and adapt it to the Xeon Phi to validate the performance of our optimized hand-written assembly implementation, as well as that of several different auto-vectorization approaches.
Related resources
Modern Platform for Parallel Algorithms Testing: Java on Intel Xeon Phi
Parallel algorithms are a popular method of increasing system performance. Apart from showing their properties using asymptotic analysis, a proof-of-concept implementation and practical experiments are often required. In order to speed up development and provide a simple and easily accessible testing environment that enables the execution of reliable experiments, the paper proposes a platform with mu...
Analysis of the Execution-Time Variation of OpenMP-based Applications on the Intel® Xeon Phi™
The Intel® Xeon Phi accelerator is currently being used in several large-scale computer clusters and supercomputers to enhance the execution-time performance of computation-intensive applications. While performing a comprehensive profiling of the Intel® Xeon Phi execution-time behavior of different applications included in the Rodinia Benchmark suite, we observed large variations in applicati...
Understanding the Costs of Many-Task Computing Workloads on Intel Xeon Phi Coprocessors
Many-Task Computing (MTC) aims to bridge the gap between HPC and HTC. MTC emphasizes running many computational tasks over a short period of time, where tasks can be either dependent or independent of one another. MTC has been well supported on Clouds, Grids, and Supercomputers on traditional computing architectures, but the abundance of hybrid large-scale systems using accelerators has motivat...
A Performance and Scalability Analysis of the Tsunami Simulation EasyWave for Different Multi-Core Architectures and Programming Models
In this paper, the performance and scalability of different multi-core systems are experimentally evaluated for the tsunami simulation EasyWave. The target platforms include a standard Ivy Bridge Xeon processor, an Intel Xeon Phi accelerator card, and a GPU. OpenMP, MPI and CUDA were used to parallelize the program for these platforms. The absolute performance of the application on the diffe...
Porting FEASTFLOW to the Intel Xeon Phi: Lessons Learned
In this paper we report our experiences in porting the FEASTFLOW software infrastructure to the Intel Xeon Phi coprocessor. Our efforts involved evaluating programming models, including OpenCL, POSIX threads and OpenMP, as well as typical optimization strategies such as parallelization and vectorization. Since the straightforward porting process of the already existing OpenCL version of the cod...